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Abstract 

During voiced speech, the human vocal folds interact with the vocal 
tract acoustics. The resulting glottal source-resonator coupling has been 
observed using mathematical and physical models as well as in in vivo 
phonation. We propose a computational time-domain model of the full 
speech apparatus that, in particular, contains a feedback mechanism from 
the vocal tract acoustics to the vocal fold oscillations. It is based on 
numerical solution of ordinary and partial differential equations dehned 
on vocal tract geometries that have been obtained by Magnetic Resonance 
Imaging. The model is used to simulate rising and falling pitch glides of 
[a, i] in the fundamental frequency (/o) interval [150Hz,320Hz]. The 
interval contains the first vocal tract resonance fm and the first formant 
Fi of [i] as well as the fractions of the hrst resonance fml^ and fm/S of 
[a]. 

The simulations reveal a locking pattern of the /o-trajectory at fm 
of [i] in falling and rising glides. The resonance fractions of [a] produce 
perturbations in the pressure signal at the lips but no locking. All these 
observations from the model behaviour are consistent and robust within 
a wide range of feasible model parameter values and under exclusion of 
secondary model components. 

Index Terms: Speech modelling, vocal fold model, flow induced vibrations, 
modal locking. 
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1 Introduction 


The classical source-filter theory of vowel production is built on the assumption 
that the sonrce (i.e., the vocal fold vibration) operates independently of the 
filter (i.e., the vocal tract, henceforth VT) whose resonances modnlate the re- 
snlting vowel sound [1, 2]. Even thongh this approach captnres a wide range of 
phenomena in speech prodnction, at least in male speakers, some observations 
remain nnexplained by the sonrce-filter model lacking feedback. The purpose 
of this article is to deal with some of these observations nsing compntational 
modelling. 

More precisely, simulated rising and falling frequency glides of vowels [a] 
and [i] over the frequency range [150 Hz, 320 Hz] are considered. Similar glides 
recorded from eleven female test snbjects are treated in the companion article 
[3]. Snch a vowel glide is particularly interesting if its glottal freqnency (/o) 
range intersects an isolated acoustic resonance of the snpra- or snbglottal cavity, 
which we here assnme to correspond to the lowest formant Fi. Since Fi almost 
always lies high above /o in adnlt male phonation, this situation occnrs typically 
in female snbjects and only when they are producing vowels snch as [i] with low 
Fi. As reported below, simnlations reveal (in addition to other observations) a 
characteristic locking behaviour of fo at the VT acoustic resonance fm « Fi. 
To check the robnstness of the model observations, secondary featnres of the 
model and the role of nnmodelled physics are discnssed at the end of the article. 

As a matter of fact, this article has two eqnally important objectives. Firstly, 
we pursne better understanding of the time-domain dynamics of glottal pnlse 
perturbations near fm of [i] and at other acoustic “hot spots” of the VT and 
the snbglottal system within [150 Hz, 320 Hz] that may be reached in speech or 
singing. An aconstic and flow-mechanical model of the speech apparatns is a 
well-snited tool for this purpose. Secondly, we introdnce and validate a compu¬ 
tational model that meets these reqnirements. The proposed model has been 
originally designed to be a glottal pnlse source for high-resolution 3D compnta¬ 
tional aconstics model of the VT which is being developed for medical pnrposes 
[5]. There is an emerging application for this model as a development platform 
of speech signal processing algorithms such as discussed in [6], [7] and [8]; how¬ 
ever, the model introduced in [9] has been nsed in [8]. Since perturbations of 
fo near Fi are a widely researched, yet quite multifaceted phenomenon, it is a 
good candidate for model validation experiments. 

The simulations carried out in this paper indicate special kinds of perturba¬ 
tions in vocal folds vibrations near a VT resonance as reported below. The mere 
existence of snch pertnrbations is not snrprising considering the wide range of 
existing literatnre. Since the seminal work of [10], a wide range of glottal source 
perturbation patterns related to aconstic loading has been investigated. Ex¬ 
periments were carried out in [11] on excised larynges monnted on a resonator 
to determine how glottal amplitnde ratio changes with the snbglottal resonator 
length. Physical models were used in [12] with a snbglottal resonator to stndy 
phonation onsets and offsets, and in [13] with snb- and snpraglottal resonators 

The VT resonances fRi,fR 2 ,--- are understood here as purely mathematical objects, 
determined by an acoustic PDE and its boundary conditions that are defined on the VT 
geometry. Formants Fi, F 2 ,... refer to respective frequency peaks extracted from natural or 
simulated speech. Here, the notation of [4] is used to differentiate the two although, of course, 
we expect to have /rj ps Fj for j = 1,2,.... 
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to study phonation onsets. The latter also considered the dynamics of frequency 
jumps as the natural frequency of their physical model was varied over time. 
Similarly, a physical model of phonation with tubular, variable length supraglot- 
tal resonator was studied in [14, 15], and it was used to validate a flow-acoustic 
model somewhat resembling the one proposed in this article. 

In [16] the problem was approached using both reasoning based on sub- and 
supraglottal impedances and a non-computational flow model as well as com¬ 
putational model comprising a multi-mass vocal fold model and wave-reflection 
models of the subglottal and supraglottal systems. A two-mass model of vo¬ 
cal folds, coupled with a variable-length resonator tube, was used in [17], and 
pitch glides were simulated using a four-mass model to analyse the interactions 
between vocal register transitions and VT resonances in [18]. 

These works reveal a consistent picture of the existence of perturbations 
caused by resonant loads, and this phenomenon has also been detected experi¬ 
mentally in [19] using speech recordings and in [20] using simultaneous record¬ 
ings of laryngeal endoscopy, acoustics, aerodynamics, electroglottography, and 
acceleration sensors. The latter article also contains a review on related voice 
bifurcations. 

Although the existence of these perturbations has been well reported, speech 
modelling studies have given only limited attention to the time-domain dynam¬ 
ics of fundamental frequency glides where such perturbations would be expected 
to occur. Of the above mentioned studies, upward glides were simulated in [13] 
by varying the natural frequency of their physical model over time. Their small 
amplitude oscillation model exhibited a frequency jump in the vicinity of the 
resonance of their downstream tube when the load resistance was sufficiently 
strong. Downward glides were simulated in [16] followed by upward glides by 
varying the parameters of a multi-mass vocal fold model. Frequency jumps, 
subharmonics and amplitude changes were observed in the regions where load 
reactances were changing rapidly. Changes in the rate of change of the funda¬ 
mental frequency in these regions can also be seen in their Figures 10-14. In 
[18] upward glides were simulated followed by downward glides by adjusting the 
tension parameter (i.e. decreasing masses and increasing stiffness parameters by 
the same factor) in their four-mass vocal fold model. They observed frequency 
jumps associated with register changes, which in turn were shown to occur at 
different frequencies depending on the vocal tract load. 

Some of the most popular approaches to modelling phonation are based 
on the Kelly-Lochbaum vocal tract [21] or various transmission line analogues 
[22, 23, 24] . Contrary to these approaches, the proposed model consists of 
(ordinary and partial) differential equations, conservation laws, and coupling 
equations. In this modelling paradigm, the temporal and spatial discretisation 
is conceptually and practically separated from the actual mathematical model 
of speech. The computational model is simply a numerical solver for the model 
equations, written in MATLAB environment. The modular design makes it 
easy to decouple model components for assessing their significance to simulated 
behaviour. Since the generalised Webster’s equation for the VT acoustics as¬ 
sumes intersectional area functions as its geometric data, VT configurations 
from Magnetic Resonance Imaging (MRI) can be used without transcription 

Some economy of modelled features should be maintained to prevent various forms of 
“overfitting” while explaining the experimental facts. Good modelling practices within math¬ 
ematical acoustics have been discussed in Chapter 8 in [25]. 
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to non-geometric model parameters. Thus, time-dependent VT geometries are 
easy to implement. Further advantages of speech modelling based on Webster’s 
equation have been explained in [26] where the approach is somewhat similar 
to one taken here. 

The proposed model aims at qualitatively realistic functionality, tunability 
by a low number of parameters, and tractability of model components, equa¬ 
tions, and their relation to biophysics. Similar functionality in higher precision 
can be obtained using Computational Fluid Dynamics (CFD) with elastic tissue 
boundaries. In the CFD approach, the aim is to model the speech apparatus 
as undivided whole [27], but the computational cost is much higher compared 
to our model or the models proposed in, e.g., [26] and [28]. The numerical effi¬ 
ciency is a key issue because some parameter values or their feasible ranges (in 
particular, for hard-to-get physiological parameters) can only be determined by 
the trial and error method as discussed in Chapter 4 in [29], leading to a high 
number of required simulations. 


2 Model of the Vocal Folds 

2.1 Anatomy, physiology, and control of phonation 

All voiced speech sounds originate from self-sustained quasi-periodic oscillations 
of the vocal folds where the closure of the aperture — known as the rima glottidis 
— between the two string-like vocal folds cuts off the air flow from lungs. This 
process is called phonation, and the system comprising the vocal folds and the 
rima glottidis is known as the glottis. A single period of the glottal flow produced 
by phonation is known as a glottal pulse. A description of structures in human 
larynx and their function can be found, e.g., in [30] or [31], and we give here 
only a brief summary. 

As shown in Figure 1 (upper left panel), each vocal fold consists of a vo¬ 
cal ligament (also known as a vocal cord) together with a medial part of the 
thyroarytenoid muscle, and the vocalis muscle (not specified in upper left panel 
of Figure 1). Left and right vocal folds are attached to the thyroid cartilage 
from their anterior ends and to the respective left and right arytenoid cartilages 
from their posterior ends. In addition, there is the fourth, ring-formed cricoid 
cartilage whose location is inferior to the thyroid cartilage. The vocal folds and 
the associated muscles are supported by these cartilages. 

There are two muscles attached between each of the arytenoid cartilages 
and the cricoid cartilage: the posterior and the lateral cricoarytenoid muscles 
whose mechanical actions are opposite. The vocal folds are adducted by the con¬ 
traction of the lateral cricoarytenoid muscles during phonation, and conversely, 
abducted by the posterior cricoarytenoid muscles during, e.g., breathing. This 
control action is realised by a rotational movement of the arytenoid cartilages 
in a transversal plane. In addition, there is a fifth (unpaired) muscle — the 
arytenoid muscle — whose contraction brings the arytenoid cartilages closer to 
each other, thus reducing the opening of the glottis independently of the lateral 
cricoarytenoid muscles. These rather complicated control mechanisms regulate 
the type of phonation in the breathy-pressed scale. 

The main mechanism controlling the fundamental frequency fo of voiced 
speech sound is actuated by two cricothyroid muscles (not visible in upper left 
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Figure 1: The topmost panel on left: Sketch of the anatomy of the larynx seen 
from above according to [32]. Right panel: The geometry of the glottis model 
and the symbols nsed. The trachea (i.e., the channel leading from the Inngs 
to glottis) is to the left in this sketch and the vocal tract is to the right. The 
lower panel on left: Lnmped-element representation of the glottis model with 
two degrees of freedom shown for the lower vocal fold (j = 1). 
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panel of Figure 1). The contraction of these muscles leads to a rotation of the 
thyroid cartilage with respect to the cricoid cartilage. As a result, the thyroid 
cartilage inclines to the anterior direction, thus stretching the vocal folds. The 
elongation of the string-like vocal folds leads to increased stress which raises 
the fundamental frequency /<, of their longitudinal vibrations. The vertical 
movement of larynx also rotates cricoid cartilage impacting fo- Finally, the 
phonation and fo are influenced by subglottal pressure through the control of 
respiratory muscles. 

2.2 Glottis model 

The anatomic configuration in upper left panel of Figure 1 is idealised as a low- 
order mass-spring system with aerodynamic surfaces as shown in right panel 
of Figure 1 and discussed in [33] and [29]. Such lumped-element models have 
been used frequently (see, e.g., [34], [35], [36], [37], [15], and [38]) since the 
introduction of the classic two-mass model [10[. For recent reviews of the variety 
of lumped-element and PDF based models and their applications, see [39], [40] 
and [41] . 

The radically simplified glottis model geometry in Figure 1 (right panel) 
corresponds to the coronal section through the center of the vocal folds. Both 
the fundamental frequency fo as well as the phonation type can be chosen by 
adjusting parameter values (see Section 4 in [29]). Register shifts (e.g., from 
modal register to falsetto) are not in the scope of this model since it would 
require either modelling the vocal folds as aerodynamically loaded strings or as 
a high-order mass-spring system that has a string-like “elastic” behaviour. 

The vocal fold model in Figure 1 (right panel) consists of two wedge-shaped 
moving elements that have two degrees of freedom each: each end of the vocal 
fold can move in the y-direction. Although this causes some distortion to the 
shape of the wedges, the displacements are small enough that this effect is 
negligible. The distributed mass of these elements is reduced into three mass 
points which, for the j**' fold, j = 1,2, are located so that rriji is at a: = L, 
mj 2 at X = 0, and rrijs at x = L/2. Here L denotes the thickness of the 
modelled vocal fold structures. In calculation of the masses, the reduced system 
retains the mass, and static and inertial moments of a more realistic vocal fold 
shape (for details, see [33] p. 14). The elastic support of the vocal ligament is 
approximated by two springs at points x = liL and x = I 2 L. The equations of 
motion for the vocal folds are given by 

f MilFi(t) + HilFi(t) + iFiVFi(t) =Fi(t), 

\ M2W2{t) + B2W2{t) + K2W2{t) = F2(t), t S K, 

where Wj = [wji ^ are the displacements of rriji and mj 2 in the y- 

direction as shown in Figure 1 (lower left panel). The loading force pair Fj{t) = 

[F,i{t) F,2{t)]^ is due to aerodynamic and acoustic pressure forces in Eq. (8) 

when the glottis is open, and the collision forces in Eq. (4) when the vocal folds 
are in contact. The respective mass, damping, and stiffness matrices Mj, Bj, 
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and Kj in (1) are 


Mj = 




4 


^j3 

4 


and Kj = 


^j3 

4 


mj2 + 

l!kji 4 

hhikji 


4 


l^k 


2«-j2 


= 

bji 

0 


0 

bj2 _ 

hhikji + kj2) 


) l^kji +1 

lkj2 



(2) 


The entries of these matrices have been computed by means of Lagrangian 
mechanics in [33]. The damping matrices Bj are diagonal since the dampers 
are located at the endpoints of the vocal folds. The model supports asymmetric 
vocal fold vibrations but for this work symmetry is imposed on the vocal folds 
by using parameters M = Mj, K = Kj, and B = Bj, j = 1,2, and by setting 
F{t) = F 2 {t) = —Fi{t). The parameters in equation (2) as well as the loading 
force components in equation (1) are illustrated in Figure 1 (right and lower left 
panels). 

The glottal openings at the two ends of the vocal folds, denoted by AWi, 
i = 1,2, are related to equations (1) through 


AfFi 

AfFa 


= W2-Wi + 


9 

Ho 


(3) 


where the rest gap parameters g and FIq are as in Figure 1 (right panel). In 
human anatomy, the parameter g is related to the position and orientation of 
the arytenoid cartilages. 

As is typical in related biomechanical modelling [34, 42, 18[, the lumped pa¬ 
rameters of the mass-spring system (1)- (2) are in some correspondence to the 
true masses, material parameters, and geometric characteristics of the sound 
producing tissues. More precisely, matrices M correspond to the vibrating 
masses of the vocal folds, including the vocal ligaments together with their 
covering mucous layers and (at least, partly) the supporting vocalis muscles. 
The elements of the matrices K are best understood as linear approximations 
of k{s) = f /s where / = /(s) is the contact force required for deflection s at the 
center of the string-like vocal ligament in the anatomy shown in Figure 1 (upper 
left panel). It should be emphasised that the exact numerical correspondence 
of tissue parameters to lumped model parameters M and K is intractable (and 
for most practical purposes even irrelevant), and their values in computer sim¬ 
ulations must be tuned using measurement data of fo and the measured form 
of the glottal pulse [29[. 


2.3 Forces during the closed phase 

During the glottal closed phase (i.e., when AWi{t) < 0 at the narrow end of 
the vocal folds), there are no proper aerodynamic forces affecting the vocal 
folds dynamics in equations (1). There are, however, nonlinear spring forces 
with parameter kn, accounting for the elastic collision of the vocal folds. They 
are accompanied by the resultant acoustic counter pressure from the VT and 
subglottal cavities, denoted by Pc = Pc(t) in equation (13) below. Thus, the 
force pair for equations (1) during glottal closed phase is given by 


F = Fh = 


kH\AWi\^/^ - Cp,p, 
CpcPc 


for AVFi < 0. 


(4) 
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Here, the coupling coefficient Cpc = Cpc(t) accounts for the moment arms and 
areas on which pc acts, and it will be given an expression in equation (14). 

This approach is related to the Hertz impact model that has been used 
similarly in [34] and [43]. During the glottal open phase (i.e., when AWi{t) > 
0), the spring force in equations (4) is not enabled. Then the load terms in 
equation (1) are given by F(t) = FA{t) as introduced below in equation (8) in 
terms of the aerodynamic forces from the glottal flow. 

3 Glottal Flow and the Aerodynamic Force 

The air flow within the glottis is assumed to be incompressible and one¬ 
dimensional with velocity V{x^t), satisfying the mass conservation law 
H{x,t)V{x,t) = HiVo{t), where H{x,t) is the height of the flow channel in¬ 
side the glottis. In the model geometry of Figure 1 (right panel) we have 

H{x,t) = AW 2 {t) + y(ATTi(t) - AW 2 {t)), x e [0,L]. 

Jj 

The velocity Vo{t) of the flow through the control area hFli superior to the 
glottis is described by 

io{t) = ^ hTF ~ Rg{t)'Vo{t)) (5) 

where Psub is an ideal pressure source (whose values are given relative to the 
ambient air pressure) located immediately inferior to the vocal folds, Ciner reg¬ 
ulates flow inertia, h is the width of the rectangular flow channel, and Rg{t) 
represents the total pressure loss in the glottis. In fact, equation (5) is related 
to Newton’s second law for the air column in motion. Although the pressure 
driving phonation originates at the lungs, it is here assumed that physiological 
mechanisms enable the adequate control of the pressure source Psub- Note that 
due to assumed incompressibility, equation (5) can equivalently be written in 
terms of position-independent volume flow through the glottis U{t) = Vo{t)hHi. 

To derive equation (5) following [33, Section 2.2], one begins with the pres¬ 
sure loss balance Psub = Pg + Pa where the Psub is the sum of the glottal 
pressure loss and the accelerating pressure of the fluid column mass in the 
airways and lungs. The power of accelerating or decelerating an (incompress¬ 
ible) fluid column is Pa(t){hF{i)vo{t). This power is equal to the derivative of 
the kinetic energy of the fluid column, yielding the identity pa{t){hF{i)vo{t) = 
pvo{t)vQ{t){hHi)‘^ J where the integration is extended over the VT and 

SGT volume. Here A{r) denotes the area of the fluid column cross-section that 
contains the position vector r, and the incompressibility A{r)v{f,t) = hF[iVo{t) 
was used. Equation (5) follows from this by denoting Ciner = pf 
contribution of the VT to the total inertance can be further integrated to 
= P lo''^ inertance of the subglottal masses cannot be ex¬ 

pressed similarly in terms of anatomic data. Hence, the parameter Ciner h^-s to 
be used as a tuning parameter. 

The total pressure loss in the glottis in equation (5) consists of two compo- 





nents, namely 


Rg{t) = Rv{t) + Rt{t), where 


Rv{t) 


l2fJ,HlLg 


and 


Rt{t) = kg 


pH^Vojt) 

2AWi{t)'^' 


( 6 ) 


The first term Rv{t) represents the viscous pressure loss, and it is motivated by 
the Hagen-Poiseuille law in a narrow aperture. It approximates the pressure 
loss in the glottis using a rectangular tube of width h, height AWi, and length 
Lg. The parameter p, is the kinematic viscosity of air. The second term Rt{t) 
takes into account the pressure losses not attributable to viscosity in the sense 
of Ry, and its form is motivated by the experimental work in [44]. The coef¬ 
ficient kg represents the difference between energy loss at the glottal inlet and 
pressure recovery at the outlet. This coefficient depends not only on the glottal 
geometry but also on the glottal opening, subglottal pressure, and flow through 
the glottis [45]. It should be noted that equations (5)-(6) bear resemblance to 
the description of airflow in [14, 15] where the pressure loss and recovery effects, 
however, are accounted for by flow separation in a diverging channel. 

The pressure p = p{x, t) in the glottis is given in terms of vq from equation (5) 
by making use of the mass conservation and the Bernoulli theorem p{x, t) + 
^pV{x,t)'^ = Psub for static flow. Since each vocal fold has two degrees of 
freedom, the pressure p in the glottis and the VT/SGT counter pressure Pc can 
be reduced to an aerodynamic force pair F 4 = [Fa,i Fa, 2 ] where F^.i affects 
at the right (i.e., the superior) end of the glottis (x = L) and Fa ,2 the left (i.e., 
the inferior) end {x = 0) in Figure 1 (lower left panel). This reduction can be 
carried out by using the total force and moment balance equations 


Fa,i + Fa,2 


L ■ Fa,i 


h / (p(x, t) - Psub) dx and 

do 

/ x{p{x,t) - psub)dx - LCpcPc, 
do 


h 

COS^ (f) 


(7) 


where (j) is the angle of the inclined vocal fold surface from the horizontal as 
shown in Figure 1 (right panel), and Cpc = Cpc(t) accounts for the moment arms 
and areas on which pc acts. An expression for Cpc is given in equation (14). The 
force calculations are done using the pressure difference p(x, t) —psub because we 
assume that wij = 0 for all i,j = 1,2 is the equilibrium under subglottal pres¬ 
sure Psub (making the simplifying approximation that this equals the ambient 
hydrostatic pressure in the tissues surrounding the vocal folds), and therefore 
forces Fa,i and Fa ,2 must vanish when p(a;,t) = Psub and pc = 0. Note that 
since the displacements Wi are in the y-direction only, the aerodynamic forces 
have been assumed to act in this direction as well as shown in Figure 1 (lower 
left panel). The moment is evaluated with respect to point {x,y) = (0,0) for 
the lower fold and {x,y) = {0 ,Hq) for the upper fold in Figure 1 (right panel). 
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Evaluation of these integrals yields 

_ phLHlvl ( 1 1 

2cos2<;i AtEi(AlE2-AVEi) (AlEi - AtEa)^ \^Wi)) 

— CpcPc, for ATEi > 0, and 

^ phLHlvg ( sin^ c/fATEa + cos^ (/>AlEi 1 ^ / AlEa \ \ 

“ 2 cos2 (j) lvAtEiAfE2(AtEa - AfEi) ~ (AtEi - AlEa)^ l^AlEi )) 
+ CpcPc, for AfEi > 0. 

( 8 ) 

During the glottal closed phase (i.e., when AfEi(t) < 0), the aerodynamic force 
in equations (8) is not enabled, and the vocal fold load force is instead given by 
equation (4) above. 


4 Vocal Tract and Subglottal Acoustics 


4.1 Modelling VT acoustics by Webster’s equation 

A generalised version of Webster’s horn model resonator is used as acoustic 
loads to represent both the VT and the SGT. It is given by 

1 2naiW{s) dj; _= n ('Q'l 

c 2 E(s )2 dt^ A(s) dt A(s) ds\ dsj ’ ^ ’ 

where c denotes the speed of sound, the parameter Oi > 0 regulates the energy 
dissipation through air/tissue interface, and the solution ip = ip(s,t) is the 
velocity potential of the acoustic field. Then the sound pressure is given by 
p = pipt where p denotes the density of air. The generalised Webster’s model 
for acoustic waveguides has been derived from the wave equation in a tubular 
domain in [46] , its solvability and energy notions have been treated in [47] , and 
the approximation properties in [48] . 

The generalised Webster’s equation (9) is applicable if the VT is approxi¬ 
mated as a curved tube of varying cross-sectional area and length Lyr- The cen¬ 
treline 7 : [ 0 , Lvt] — > of the tube is parametrised using distance s G [ 0 , Lyr] 
from the superior end of the glottis, and it is assumed to be a smooth planar 
curve. At every s, the cross-sectional area of the tube perpendicular to the cen¬ 
treline is given by the area function A(s), and the (hydrodynamic) radius of the 
tube, denoted by R{s) > 0, is defined by A(s) = The curvature of the 

tube is defined as k{s) := ||7"(s)||, and the curvature ratio as ^(s) := R{s)k{s). 
Since the tube does not fold on to itself, we have always r]{s) < 1, and clearly 
77 = 0 if the tube is straight. 

We are now ready to describe the rest of the parameters appearing in equa¬ 
tion (9): They are the stretching factor lE(s) and the sound speed correction 
factor E(s), defined by 


IE(s) := i?(s)v/i?'(s)2 + (r;(s)-l)2, 

E(s) := (l + ir 72 (s))-'/^ 

In the context of VT, we use the following boundary conditions for equation (9): 

f ^{LyT,t) + 9c^{LyT,t) = 0, , . 

I = -civoit). ^ ’ 
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The first boundary condition is imposed at the mouth opening, and the pa¬ 
rameter 6 * > 0 is the normalised acoustic resistance due to exterior space. The 
values for 9 are based on the piston model given in [49, Chapter 7, Eq. (7.4.31)]. 
However, the acoustic impedance of the piston model has a significant reactive 
part as well, and its effect has been investigated by replacing the first equa¬ 
tion in ( 11 ) by another boundary condition that corresponds to the impedance 
Z{s) = The rational impedance of the same form appears also as the 

“first-order high pass model” for termination of an acoustic horn in [50, Sec¬ 
tion 4.1[. In the present article, the nominal values for R and L have been 
obtained by interpolation from the impedance of the piston model as given in 
Table 3 below where also the effects of the reactive component are discussed. 

The latter boundary condition in equation (11) couples the resonator to 
the glottal flow given by equation (5). The scaling parameter ci = hHi/A{0) 
extends the assumption of incompressibility from the control area just right to 
the glottis in Figure 1 (right panel) to the VT area slice nearest to the glottis. 
Using Cl and a VT geometry independent control area, instead of defining Vq 
as the flow through H( 0 ) directly, reduces the sensitivity of the model to the 
accurate placement of the glottis in the VT geometries which can be problematic 
in MRI data. 

4.2 Subglottal tract acoustics 

Anatomically, the SGT consists of the airways below the larynx: trachea, 
bronchi, bronchioles, alveolar ducts, alveolar sacs, and alveoli. This system 
has been modelled either as a tree-like structure [28[ or, more simply, as an 
acoustic horn whose area increases towards the lungs [51, 36[. We take the lat¬ 
ter approach and denote the cross-sectional area and the horn radius by As(s) 
and i?s(s), respectively, where s € [0,Lsgt] and Lsgt is the nominal length of 
the SGT. 

Since the subglottal horn is assumed to be straight, i.e. 77 = 0, we have 
E = 1 and lUs(s) = Rs{s)y^Rg{s)^ + 1. Then equations (9)-(ll) translate to 

{ 1 I ‘i.T^aiWs^s) _ 1 d ( A („\^'\ — n 

' As(s) dt a1s(s) ds ' ds J ’ 

^{LsGT,t) + (^sC^iLsGT,t) = 0 , 

11(0, t) = C2Vo{t), 

where the solution ip is the velocity potential for the SGT acoustics. We now 
use the scaling parameter value C 2 = hHi/As{0). The same kind of losses are 
considered in the SGT as in the VT: a termination loss characterised by nor¬ 
malised acoustic resistance 9s > 0 , and energy dissipation through the air/tissue 
interface along the length of the horn regulated by parameter 02 > 0 . 

4.3 The acoustic counter pressure 

The final part of the vowel model produces the feedback coupling from VT/SGT 
acoustics back to glottal oscillations. This coupling is realised by the product 
of the acoustic counter pressure Pc = Pc{t) and the coupling coefficient Cpc = 
Cpc{t) as already shown in equations (4) and ( 8 ) above. 
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Figure 2: Left: The VT intersections extracted from the test subject during 
phonation of [a] and [i]. Right: The resulting area functions for equation (9), 
represented as a function of distance from the glottis. 


The counter pressure is the resultant of sub- and supraglottal pressure com¬ 
ponents, and it is given in terms of velocity potentials from equations (9) and 
( 12 ) by 

Pc(t) = QpcP t) - C3V't(0, f)) . (13) 

The tuning parameter Qpc G [0,1] enables scaling the magnitude of the 
feedback from the VT and SGT resonators to the vocal folds. The parameter 
Qpc is necessary because it is difficult to estimate from anatomic data the area 
on which the counter pressure Pc acts. In simulations, excessive acoustic load 
forces lead to non-stationary, even chaotic vibrations of the vocal folds. 

The second parameter C 3 > 0 in equation (13) accounts for the differences in 
the areas and moment arms for the supra- and subglottal pressures that load the 
equations of motion equations (1) for vocal folds. Based on the idealised vocal 
folds geometry in Figure 1 (right panel), we obtain an overly high nominal value 
C3 = 8.6. In the simulations of this article, we use Qpc as a tuning parameter 
to obtain the desired glottal pulse waveform, and the value of C 3 is kept fixed 
(one could say, arbitrarily) at C 3 = 1 (if the subglottal resonator is coupled) or 
C 3 = 0 (if the subglottal acoustics is ignored). If it is necessary for producing 
a realistic balance between supra- and subglottal feedbacks, the value of C 3 can 
be increased without losing stable phonation up to QpcCs « 0 . 6 . 

The coupling coefficient Cpc is best understood in reference to the moment 
balance in equation (7), although it appears in the same way in both equations 
(4) and ( 8 ). The counter pressure Pc is assumed to affect only in the longitudinal 
direction (i.e., vertically in right panel of Figure 1). For each vocal fold, pc 
acts on an area of h ^nd produces a moment arm of ^go-g^i-AWi 

around points {x,y) = ( 0 , 0 ) and {x,y) = (0,i7o) for the lower and upper folds, 
respectively. Hence 


^ ^ (gj - AW^){2Ho - iJi - AIVi) 

pc ■ I J 

5 Anatomic Data and Model Parameters 

5.1 Area functions for VT and SGT 

Solving Webster’s equation requires that the VT is represented with an area 
function and a centreline, from which curvature information can be computed. 
Two different VT geometries corresponding to vowels from a healthy 26 years 
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old female are used: A prolonged [a] produced at fundamental frequency fo = 
168 Hz and similarly produced [i] at fo = 210 Hz. These geometries have 
been obtained by Magnetic Resonance Imaging (MRI) using the experimental 
setting that has been described in [5]; see also [52, 53, 54] for earlier approaches. 
The extraction of the computational geometry from raw MRI data has been 
carried out by the custom software described in [55, 56[. The VT geometries and 
the area functions are shown in Figure 2, and related VT geometry dependent 
parameter values are given in Table 1. 

The piston model [49, Chapter 7[ gives the expression 9 = 2Tr A{Lvt)I for 
the normalised acoustic resistance in equation (11) where we use the nominal 
wavelength £ = 171.5 mm, corresponding to the centre frequency 2 kHz of the 
voice band. The values for 6*[a[, 0[i[ together with the resonances /i?j[a[, fRj{i] 
for j = 1,2 are given in Table 1. These values of the purely resistive load 
correspond to an infinitely long, non-resonant waveguide placed in front of the 
mouth, diameter of which is 95.2 mm for [a[ and 96.6 mm for [i[. 

Table 1: Physical and physiological parameters dependent on the VT geometry. 


Parameter 

[a] 

[i] 

normalised acoustic resistance at mouth, 6 

0.064 

0.014 

area at mouth 

299 mm^ 

66 mm^ 

VT inertia parameter, 

2540 kg/m^ 

2820 kg/m^ 

length of VT, Lyr 

132 mm 

136 mm 

1st VT resonance, fm, from equations (9)“(11) 

749 Hz 

199 Hz 

2nd VT resonance, /i^ 2 , from equations (9)"(11) 

2084 Hz 

2798 Hz 


The MRI data that is used for the VT does not cover all of the SGT. For 
this reason, an exponential horn is used as the subglottal area function for 
equation (12) 

A,(s) = A,(0)e- where e = i In () (15) 

following [51[. The values for As(0) = 2cm^ and As{Lsgt) = 10cm^ are taken 
from Figure 1 in [51[. The horn length Lrgt is tuned so that the lowest sub- 
glottal resonance is = 500 Hz which results in the second lowest resonance 
at /|j 2 = 1000 Hz. This is a reasonable value for fjn based on Table 1 in [11[; 
see also [57, 58[, [43[ and Figure 1 in [28[. The SGT lung termination resistance 
in equation (12) is given the value 6s = 1 which corresponds to an absorbing 
boundary condition. The air column in this SGT model has a nominal inertia 
parameter value of 1040 kg/m'^ which is taken as a guideline for tuning the total 
inertia of the airways to obtain desirable flow pulse waveforms. For the simu¬ 
lations in this article, Ciner = where refers to the VT inertia 

parameters given in Table 1. 

5.2 Static parameter values 

Table 2 lists the numerical values of physiological and physical constants used 
in all simulations. Note that the vocal fold springs are, for this study, placed 

In fact, she is one of test subjects in the experimental companion article [3]. 
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symmetrically about the midpoint of the vocal folds. Based on the acoustic 
reflection and transmission coefficients at the air/tissue interface, the common 
value of the energy loss coefficients oi and 02 in equations (9) and (12), respec¬ 
tively, is taken as 

OL\= ai= ^ = 7.6 • 10“^ —. (16) 

PhCh ni 

All the model parameter values introduced so far are assumed to be equally 
valid for both female and male phonation, except for vocal fold length h. As 
we are treating female phonation in this article, it remains to describe the pa¬ 
rameter values for equations (1) where the differences between female and male 
phonation are most significant. Horacek et al. provide parameter values for M 
and K for in male phonation [34, 42] but similar data for female subjects cannot 
be found in literature. Instead, the masses in M are calculated by combining 
the vocal fold shape function used in [34] with female vocal fold length reported 
in [59]. A first estimate for the spring coefficients in K is calculated by assuming 
that the first eigenfrequency of the vocal folds matches the starting frequency 
for the simulations. The spring coefficients are then adjusted until simulations 
produce the desired starting fundamental frequency for the /o-glides, giving the 
constant AT® for equations (17)--(18). For details of these rather long calcula¬ 
tions, see [33] and [29]. 

Table 2: Physical and physiological constants. 


Parameter 

Symbol 

Value 

speed of sound in air 

c 

343 m/s 

density of air 

p 

1.2 kg/m^ 

kinematic viscosity of air 

p 

18.27 fiN s/m2 

vocal fold tissue density 

Ph 

1020 kg/m3 

VT loss coefficient 

ai 

7.6 -10“^ s/m 

SG loss coefficient 

02 

7.6 -10“^ s/m 

spring constant in contact (from [34]) 

kn 

730 N/m 

glottal gap at rest 

9 

0.3 mm 

vocal fold length (from [59]) 

h 

10 mm 

vocal fold thickness (from [34]) 

L 

6.8 mm 

superior vocal fold spring location (from [33]) 

h 

0.85 

inferior vocal fold spring location (from [33]) 

h 

0.15 

control area height below glottis 

Ho 

11.3 mm 

control area height above glottis 

Hi 

2 mm 

equivalent gap length for viscous loss in glottis 

Lg 

1.5 mm 

SGT length 

Lsgt 

350 mm 

normalised acoustic resistance at lungs 

Os 

1 

glottal entrance/exit coefficient 

kg 

0.2 

subglottal (lung) pressure over the ambient 

Psub 

650 Pa 


Let us conclude with a sanity check on the parameter magnitudes for equa¬ 
tion (1) describing the vocal folds. The total vibrating mass for female phonation 
is 7711-1-7712-1-7713 = 0.27 g and the total spring coefficients are ki + k 2 = 216 N/m. 
These nominal values yield fo « 150 Hz for female phonation. If the charac¬ 
teristic thickness of the vocal folds is assumed to be about 5 mm, these param- 
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eters yield a magnitude estimate for the elastic modulus of the vocal folds by 
E « • 5 • 10“^ m « 15.9 kPa. This should be compared to Figure 7 in [60] 

where estimates are given for the elastic modulus of ex vivo male vocal folds 
and values between 2.0 kPa and 7.5 kPa are proposed for different parts of the 
vocal fold tissue. 


6 Computational Aspects 

6.1 Parameter control for obtaining vowel glides 

The /o-glide is simulated by controlling two parameter values dynamically. 
First, the matrix K is scaled while keeping the matrix M constant. This ap¬ 
proach is based on the assumption that the vibrating mass of vocal folds is not 
significantly reduced when the speaker’s pitch increases; a reasonable assump¬ 
tion as far as register changes are excluded. It should be noted that the rela¬ 
tive magnitudes of M and K essentially determine the resonance frequencies of 
model (1). However, attention must be paid to their absolute magnitudes using, 
e.g., dimensional analysis since otherwise the load terms Fj{t) in equation (1) 
(containing the aerodynamic forces, contact force between the vocal folds dur¬ 
ing the glottal closed phase, and the counter pressure from the VT/SGT) would 
scale in an unrealistic manner. 

The subglottal pressure, Psub, is the second parameter used to control the 
glide production. The dependence of the fundamental frequency on Psub has 
been observed in simulations [10, 61[, physical experiments using upscaled repli¬ 
cas [14[, as well as in humans [62[ and excised canine larynges [63[. The impact 
of Psub on fo is, however, secondary in these glides (/o trajectories are within a 
few Hz). Instead, Psub is scaled in order to maintain phonation and to prevent 
large changes in phonation type as the stiffness of the vocal folds changes. The 
scaling parameter value of 2 was found by trial and error to maintain the glottal 
open quotient OQ (proportion of glottal cycle during which the glottis is open), 
the closing quotient CIQ (proportion of the glottal cycle during which the flow 
is decreasing), and the maximum of AWi approximately steady over the upward 
glide when acoustic feedback was disabled. 

The parameters are scaled exponentially with time 


Kit) = 2.2^*/^ K°, 

Psub{t) — 2 ^ Psub 

(17) 

for rising glides, and 



K{t) = 2.22-2*/^iF°, 

Psub{t) = 

(18) 


for falling glides. The duration of the glide is T = 3 s, and t is the time from 
the beginning of the glide. Note that the temporal scale of the glides is long 
compared to glottal cycles, and hence the control parameters K and Psub can 
be regarded as static from the point of view of the vocal fold dynamics. Other 
starting conditions (particularly, vocal fold displacements and velocities, and 
pressure and velocity distributions in the resonators) are taken from stabilised 
simulations. These parameters produce glides with approximately in the 
range [150Hz, 320 Hz], although the exact range depends on the VT geometry 
and vocal fold damping as well. 
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Figure 3: Glide for vowel [i] with Qpc = 0.1 and (3 = 0.012 kg/s (red) and the 
same glide without VT and SGT feedback {Qpc = 0) (black). Left: fundamental 
frequency (/o), open quotient {OQ), and closing quotient {CIQ). Right: En¬ 
velopes of volume flow (C/), glottal opening (AVFi), and sound pressure at lips 
(Pm)- The values of fo, OQ, and CIQ have been extracted pulse by pulse from 
the volume flow signal. 


The damping parameters bi for i = 1,2, in equation (2) play an important 
but problematic role in glottis models. If there is too much damping (while 
keeping all other model parameters fixed), sustained oscillations do not occur. 
Gonversely, too low damping will cause instability in simulated vocal fold oscil¬ 
lations. The magnitude of physically realistic damping in vibrating tissues is not 
available, and the present model could possibly fail to give a quasi-stationary 
glottis signal even if realistic experimental damping values were used. With 
some parameter settings, the model even produces quasi-stationary signal at 
several damping levels. For simplicity, we set bi = (3 > 0 for i = 1,2, and use 
golden section search to And at least one value of vocal fold loss (3 that results 
in stable, sustained oscillation. The damping remains always so small that its 
lowering effect on the resonances of the mass-spring system (1) is negligible. 

6.2 Numerical realisation 

The model equations are solved numerically using MATLAB software and 
custom-made code. The vocal fold equations of motion (1) are solved by the 
fourth order Runge-Kutta time discretisation scheme. The flow equation (5) is 
solved by the backward Euler method. The VT and SGT are discretised by the 
FEM using piecewise linear elements {N = 29 for VT and V = 10 for SGT) 
and the physical energy norm of Webster’s equation. Energy preserving Grank- 
Nicolson time discretisation (i.e., Tustin’s method [64]) is used. The time step 
is almost always 10 /is which is small enough to keep the frequency warping 
in Tustin’s method under one semitone for frequencies under 13kHz. Reduced 
time step, however, is used near glottal closure. This is due to the discontinuity 
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Figure 4: Glide for vowel [a] with Qpc = 0.1 and /3 = 0.012 kg/s (red) and the 
same glide without VT and SGT feedback {Qpc = 0) (black). Left: fundamental 
frequency (/o), open quotient (OQ), and closing quotient (CIQ). Right: En¬ 
velopes of volume flow (U), glottal opening (AVFi), and sound pressure at lips 
{Pm)- foi OQ, and CIQ have been extracted pulse by pulse from the volume 
flow signal. 


in the aerodynamic force in equation (8) at the closure which requires numerical 
treatment by interpolation and time step reduction as explained in Section 2.4.1 
of [33]. 

Solving the equations of motion of the vocal folds is the computationally 
most expensive part of the model, taking approximately 55% of the running 
time in simulations of steady phonation with given parameter values. In com¬ 
parison, solving the Webster’s equations with precomputed mass, stiffness, and 
loss matrices takes approximately 10% of the simulation time, and the flow 
equation solver less than 2%. 

7 Simulation Results 

The results of upward glide simulations for vowels [a, i] are shown in Figures 3- 
4. The fundamental frequency fo trajectory as well as glottal open quotient OQ 
and closing quotient CIQ have been extracted from the glottal volume flow U 
signal pulse by pulse in all figures. Envelopes of U, glottal opening AWi, and 
pressure signal at lips Pm are also displayed. 

The simulations indicate a consistent locking pattern at fm [z] in fo trajec¬ 
tories that vanishes if the VT feedback is decoupled by setting Qpc = 0. The 
locking pattern in rising glides follows the representation given in Figure 6 (right 
panel): sudden jump upwards to fm, a locking to a plateau level, and a smooth 
release. Such locking behaviour is not observed for glides of [a] where /ki[ci] 
is not inside the simulated frequency range [150 Hz, 320Hz]. The vocal tract 
resonance fractions //ji[a[/4 = 187Hz and /iji[a[/3 = 250Hz, are within the 
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Figure 5: Upward (black) and downward (red) glides for vowel [i] with Qpc = 
0.04 and /3 = 0.012 kg/s. Left: fundamental frequency (fo), open quotient 
(OQ), and closing quotient (CIQ). Right: Envelopes of volume flow ([/), glottal 
opening (AWi), and sound pressure at lips {pm)- On the x-axis, relative vocal 
fold stiffness refers to the coefficient of the matrix in equations (17) and 
(18). 


frequency range, and the corresponding events are visible in the sound pressure 
signal at the lips; see Figure 4. They do not, however, cause noticeable changes 
in the fo trajectory of the glottal flow. 

The frequency jump at fjn [z] in the simulations is preceded by a decrease in 
vocal fold oscillation and glottal flow amplitudes. This is accompanied by the 
disappearance of full glottal closure {OQ = 1) and less sharp decrease in glottal 
flow during closure (higher CIQ), both of which indicate increased breathiness 
of the phonation. The locking plateau coincides with a nearly constant rate of 
decreasing OQ and CIQ, and after the release of the parameters return to 
the feedback free trajectories. 

Keeping Qpc and other model parameters the same, a falling fo glide shows 
a significantly more pronounced or longer locking at fjn compared to rising 
glides; see Figure 5. Note that in Figure 5 the x-axis is the relative vocal fold 
stiffness, which for rising glides is 2*/^ and for falling glides 2^“*/^ as given 
in equations (17) and (18). The fluctuations in /„ in the falling glides around 
the "corner" of the lock and at frequencies below this are qualitatively similar 
to what occurs at extreme values of Qpc and /3 for rising glides. In contrast, 
fluctuations in the lip pressure envelope occur temporally after the release of 
the locking in both rising and falling glides. 

Finally, the effect of model parameters (3 and Qpc on the glide simulations at 
//ji[i] is considered. These observations are qualitatively described in Figure 6. 
In the right panel, the medium values for (3 refer to the interval [0.01,0.02] 
and for Qpc to the interval [0.05,0.1]. These intervals can thus be regarded as 
feasible parameter ranges for vowel glide simulations of [i]. 

Referring to Figure 6 (right panel), the full frequency range [150 Hz, 320 Hz] 
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Figure 6: Left: trajectories for [i] fixed Qpc = 0.1 and different values of /3: 

black 0.005 kg/s, red 0.01 kg/s, green 0.015 kg/s, and blue 0.03 kg/s. Middle: fo 
trajectories for [i] with /3 = 0.012 kg/s and different values of Qpc- black 0.0, red 
0.05, green 0.1, and blue 0.25. Right: /<, trajectories for [i] qualitatively as Qpc 
and /3 increase in the direction of the arrow. Light gray background indicates 
that small parameter changes can lead to loss of quasi-stable glides. 


for fo can be obtained with modal locking as shown in Figure 3 if both Qpc and 
/3 have medium values or if one is high and the other low. If both parameters 
are high or one is high and other medium, the simulated fo range is reduced to 
above 200 Hz which is the value of /fli[i]. This glide starting frequency cannot 
be lowered by changing K, and it appears to represent very strong modal locking 
at the onset of the vowel glide simulation. 

The stability of glide simulations (understood as slowly changing amplitude 
envelope of glottal volume flow U) becomes a serious issue at low and high 
values of /3. We have tuned the subglottal pressure Psub in glide simulations 
as given in equations (17)-(18). If Psub were instead kept constant, we would 
observe an increasing OQ and decreasing amplitudes of glottal flow and vocal 
fold oscillations throughout the glide but the qualitative behaviour of modal 
locking events, including the behaviour of phonation type parameters around 
these events, remains very similar. 


8 Sensitivity and Robustness 

Parameter tuning of the vowel model is tricky business as can be seen from 
model parameter optimisation experiments described in Chapter 4 in [29]. By 
exaggerating some of the parameter values, it is possible to make vowel glide 
simulations over fm [i] behave in a way that can be excluded by experiments or 
observations from natural speech. 

In phonetically relevant simulations, various tuning parameters must be kept 
in values that are not only physically reasonable but also do not produce ob¬ 
viously counterfactual predictions. When such a realistic operating point has 
been found, it remains to make sure that the simulations give consistent and 
robust results near it. In doing so, we also check which parts of the full model 
are truly significant for the model behaviour reported above. 

8.1 Acoustics of the vocal tract by Webster’s equation 

The constants oi and 02 in respective equations (9) and (12) regulate the bound¬ 
ary dissipation at the air/tissue interface. As shown in Section 3 in [46], the 
same parameter appears in the corresponding dissipating boundary condition 
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Figure 7: Left and middle: The frequency responses of the VT acoustic loads 
for [a] and [i], computed from Webster’s equation (9). The black curves are the 
responses using the purely resistive load of equation (11) at the mouth where the 
parameter values are given in Table 1. The responses using the RL impedance 
model and its nominal parameter values in Table 3 are shown in red. Right: 
Ringing at a modal locking event of [i] shown for both types of loads at the 
mouth: purely resistive load (black) and RL model load using nominal and 
tuned parameter values (red and green, respectively). The parameter values for 
the RL impedance model are given in Table 3. 


a(j)t + = 0 for the wave equation (f)tt = c^Acj) where (j) is the 3D acoustic 

velocity potential and D denotes the exterior normal of the VT/air boundary. 
The qualitative effect of physically realistic tissue losses to vowel glide simula¬ 
tions was observed to be insignificant; see also Section 5 in [26]. However, these 
losses move slightly the VT resonance positions computed from equations (9). 

On the other hand, the VT resonances are quite sensitive to the normalised 
acoustic resistance 6 in equation (11). This parameter regulates the energy loss 
through mouth to the external acoustic space, and its extreme values 0 and oo 
correspond to open and closed ends for idealised acoustic waveguides, respec¬ 
tively. Again, physically realistic variation in 0 does not change the qualitative 
behaviour of vowel glides near /i^i[i] as reported above. 

To consider the effect of the reactive acoustic loading at the mouth opening, a 
first order impedance model was used, based on a parallel coupling of a resistive 
load R and an inductive load L. The nominal values for R and L in Table 3 
were obtained by interpolation at 200 Hz from the piston model given in [49, 
Chapter 7, Eq. (7.4.31)]. The transfer function Z{s) = approximates the 

irrational piston model impedance very well for frequencies under 2 kHz, and 
the frequency responses in Figure 7 (left and middle panels) are reasonable as 
well. 

Table 3: Values for the parameters of the RL impedance model and its transfer 
function. 


Parameter 

[a] 

[i] 

Nominal value of R 

1.98-106^ 

8.96-106^ 

Nominal value of L 

33.2^ 

70.6^ 

Tuned value of R 

(Not required.) 

8.96-104^ 

Nominal value of Z(4007rz) 

879-k 4.17 • lOS 

879 -k 8.87 • lO^z 


However, the value lims_>_|_oo Z{s) = R overestimates its piston model coun¬ 
terpart by over 700 %, and the vowel simulations show excessive ringing, e.g., at 
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modal locking events as shown in Figure 7 (right panel). In a low-order rational 
model, all of the lumped inductance appears at the mouth opening whereas the 
inductance is distributed by resistive shunting and transmission delays to an 
infinite volume in the piston model. Hence, the value of R must be tuned down 
from its nominal value so as not to contradict experimental evidence. 

Another low-order time-domain model for termination, based on an idealised 
spherical interface at a horn opening, is proposed in [50, Eq. (39)]. In its most 
general form, the model is an integro-differential delay equation with nine pa¬ 
rameters and a single delay lag. Unfortunately, the general form cannot be 
introduced to Webster’s model as a boundary condition: this is the salient fea¬ 
ture of the parallel RL model (having the same circuit topology as the first-order 
high pass model [50, Eq. (28)[) that simplifies the implementation of the FEM 
solver. 

The role of the VT curvature in equation (9) is involved, too. As can be 
seen from equation (10), the curvature results in a second order correction in 
the curvature ratio rj to the speed of sound c in equation (9). In waveguides 
of significant intersectional diameter compared to wavelengths of interest, the 
contribution of r] in equation (10) appears to be secondary to a larger error 
source that is related to curvature as well. This is caused by the fact that 
a longitudinal acoustic wavefront does not propagate in the direction of the 
geometric centreline of a curved waveguide even if the waveguide were of cir¬ 
cular intersection with constant diameter. The wavefront has a tendency to 
“cut the corners” in a frequency and geometry dependent manner, and we do 
not have a mathematically satisfying description of the “acoustically correct” 
centreline that would deal with this phenomenon optimally in the context of 
Webster’s equation. Extraction of the area function A(-) for equation (9) from 
MR images, however, requires some notion of a centreline, and using a different 
centreline would lead to slightly different version of A(-). This would somewhat 
change, e.g., the resonance frequencies of equation (9) but not the mathemati¬ 
cal structure of the model nor the results of vowel glide simulations. Hence, we 
simply use the area functions and centrelines obtained from 3D MR images by 
the custom code described in [56[ using its nominal settings. 

It remains to consider the non-longitudinal resonances of the VT. By its 
construction, the generalised Webster’s equation does not take into account at 
all the transversal acoustic dynamics of the VT. It is known from numerical 
3D Helmholtz resonance experiments on several dozens of VT geometries that 
lowest non-longitudinal resonances of the human VT tract are at approximately 
4 kHz corresponding to A/2 « 4 cm; see, e.g., [5[ and [55[. Anatomically, such 
length may appear between opposing valleculae, piriform fossae, or even across 
the mouth cavity in some VT vowel configurations. However, the upper limit 
of 4 kHz for Webster’s equation is adequate for the computation of the acous¬ 
tic counter pressure Pc in equation (13) for several octaves lower fundamental 
frequencies fo € [150 Hz, 320 Hz] that are used in vowel glide simulations. 

It should be pointed out that equations (9)-(10) with nonvanishing rj is the “right” gen¬ 
eralisation of Webster’s horn model, corresponding to the wave equation in curved acoustic 
waveguides. This approach results in the approximation error analysis given in [48]. Some¬ 
what paradoxically, a similar error analysis for the simpler model equations (9)— (10) with 
?7 = 0 would require more complicated error estimation. 
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Figure 8: Left: Volume flow ([/), glottal area (Ag = AWih), supraglottal pres¬ 
sure (psp) just superior to the vocal folds, counter pressure (pc), and subglottal 
pressure just inferior to the vocal folds (psb) without SGT (black line) and with 
SGT (red line) for the vowel [i] at /<, = 180 Hz. Glottal closure are indicated 
by squares and openings by circles. Right: Similar signals for the vowel [a]. 


8.2 Subglottal acoustics 

To large extent, what was stated above about the modelling error of the VT 
acoustics applies to the SGT acoustics as well. We complement this treatment 
by considering how and to what extent subglottal acoustics plays a role in the 
vowel glide simulations reported above 

Even though the acoustic termination at lungs is strongly resistive (see [65]), 
significant ringing (i.e., oscillatory response to an abrupt change of a forcing) 
takes place in the subglottal space. In simulations, it is seen in the bottom 
panels of Figure 8. Moreover, it has been verified by in vivo measurements 
[57, 58, 66, 12[, using physical models [11, 67[, and by mathematically mod¬ 
elling the subglottal acoustics [28[ based on anatomic data of trachea and the 
progressively subdividing system of bronchi and the alveoles [68, 69[. A refined 
model for subglottal acoustic impedance was developed in [65[ for the branching 
airway network in terms of transmission line theory, taking into consideration 
the contribution from yielding walls due to material parameters of cartilages 
and soft tissues. 

During the open phase, the inertia of the air column from bronchi up to 
mouth opening is taken into account by Ciner in equation (5). At closure, the 
flow velocity Vo drops to zero, and a rarefaction pulse is formed above the vocal 
folds due to air column inertia in the VT, and this is part of the acoustics 
modelled by equation (9). Similarly, a compression pulse is formed below the 
vocal folds, known as the “water hammer” in [70[. The subglottal resonator 
equation (12) is mainly excited by the water hammer. Both of these pulses can 
be seen in the supra- and subglottal pressure signals Psp and psb in Figure 8. 

The water hammer is the most important component of subglottal ringing, 
accompanied by its first echo that arrives back to vocal folds after approximately 
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2 ms delay. The delay corresponds to the lowest subglottal formant between 
500 Hz and 600 Hz as reported in [58] and [28]. The first echo returns during 
the glottal closure at least if fo < 150 Hz and the open quotient (OQ) of the 
pulse does not exceed 50 %; see Figure 4 in [57] for measurements and Figure 12 
in [28] for simulations. The echoes of the water hammer pulse can be clearly 
seen in Figure 8 as well but now the first echo returns after the glottis has 
opened again due to higher values of fo and OQ in these simulations. 

The observations from simulations indicate that the subglottal acoustics has 
an observable effect on glottal pulse waveform. The subglottal effect will get 
more pronounced when fo —>■ f'j^ = 500 Hz which is the predefined frequency 
of the first subglottal resonance. This can be understood in terms of the supra- 
glottal behaviour shown in Figure 8 since both the VT and the SGT resonators 
have been realised similarly within the full model. The sensitivity of the fo 
trajectory in the range [150 Hz, 320 Hz] for the subglottal effect depends on the 
magnitude of the SGT component of the counter pressure, regulated by the pa¬ 
rameter C3 in equation (13). Gonsidering the model behaviour at supraglottal 
resonance fractions of /i?i[a[, it is to be expected that the first subglottal res¬ 
onance fraction f'jiil2 should show up similarly. This, indeed, happens if the 
coupling constant C3 in equation (13) is large. 

8.3 Flow model 

The glottal flow described by equations (5)-(6) contains terms representing the 
effect of viscosity in the glottis as well as pressure loss and recovery at the 
entrance and exit of the glottis, respectively. Viscous pressure loss can easily be 
seen to be significant by considering the glottal dimensions and viscosity of air 
in the first term of equation (6). It is clear from this equation that the viscous 
losses dominate at least if the glottal opening is small. 

The importance of entrance and exit effects during parts of the glottal open 
phase can be seen, for example, by comparing simulated volume velocities and 
glottal opening areas with the experimental curves in Figure 3 in [44], obtained 
from a physical model of the glottis. In model simulations, leaving out this 
transglottal pressure loss term changes the glottal pulse waveform significantly 
if other model parameters are kept the same, as shown in Figure 3.7 in [29] . 
About half of the total pressure loss in simulations is due to entrance and exit 
effects at the peak of opening of the glottis; see Figure 3.6 in [29] . However, 
the behaviour of the simulated fo trajectories over //ji[i[ does not change if the 
transglottal pressure loss term is removed. Then, however, the vowel glide must 
be produced by different model parameter values. 


9 Discussion 

We have reported observations on the locking of simulated fo glides on a res¬ 
onance of the VT. The locking behaviour shows a consistent time-dependent 
behaviour that is similar for rising and falling glides. The fo jump at the be¬ 
ginning of the locking in rising glides and end of the locking in falling glides 
occurs together with and increased breathiness of phonation as characterised by 
open quotient OQ and closing quotient CIQ. During the locking plateau, these 
parameters indicated an approximately steady change of phonation type. 
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The locking takes place only at frequencies determined by sub- or supra- 
glottal resonances. Use of psut as a secondary control parameter for the glides 
ensure that the main cause for changes in OQ and CIQ is the acoustic loading. 
By modifying the strength of the acoustic feedback (i.e., the parameter Qpc in 
equation (13)) and vocal fold tissue losses (i.e., the parameter f3), the locking 
tendency at fm [i] may be modulated from non-existent (where both Qpc and /3 
have low values) to extreme locking at /i?i[i] without release (where Qpc and/or 
/3 have exaggeratedly large values); see Figure 6. By decoupling secondary com¬ 
ponents from the simulation model, the locking behaviour at fm [i] remains the 
same, even though the model parameter values required for the desired glot¬ 
tal waveform change. We conclude that the simulation results on vowel glides 
reported above reflect the model behaviour in a consistent and robust manner. 

Vowel glides observed in test subjects are another matter. Any model is a 
simplification of reality, and there is a catch in assessing the role of unmodelled 
physics: a proper treatment would require the modelling of it. Short of this, we 
discuss these aspects based on literature, model experiments, and reasoning by 
analogy. 

9.1 Acoustics 

Viscosity of air has not been taken into account in the acoustics model though 
a measurable effect is likely take place in narrow parts of the VT or SGT. 
Resulting attenuation can be treated by adding a dissipation term of Kelvin- 
Voigt type to equations (9) and (12). For a constant diameter waveguide, the 
term is proportional to pifsat/f?- Adding viscosity losses will widen and lower 
the resonance peaks of Webster’s resonators (i.e., lower their Q-value), with 
a slight change in the centre frequencies. An analogous effect can be studied 
by increasing the tissue dissipation parameter ai in equation (9) (or a 2 in 
equation (12)) to a very high value which has been observed not to change the 
conclusions on vowel glide simulations. Similarly, the overall acoustic resistance 
of the VT has no qualitative effect which can be seen in [71] where the modal 
locking was observed for vowel [oe] at the lowest resonance 647 Hz despite the 
fact that the anatomic geometry of [oe] has a much wider flow channel than that 
of [i[. 

The SGT modelling by the horn is a crude simplification of the fractal-like 
lower airways and lungs. The network structure of the subglottal model in [28] 
could be replicated by interconnecting a large number of Webster’s resonators, 
each modelled by equation (12). The resulting transmission graph is a passive 
dynamical system by Section 5 in [72], but it is not clear how to write an efficient 
FEM solver for such configurations. 

The model proposed in [28] as well as the transmission graph approach are 
likely to produce the correct resonance distribution and frequency-dependent 
energy dissipation rate at the lung end without tuning. The horn model does 
require tuning of the horn opening area and the boundary condition on it in 
order to get realistic behaviour on the lowest subglottal resonance f'm = 500 Hz. 
Doing so freezes all the higher subglottal resonances at fixed positions, e.g., 
//j .2 = 1 kHz. The branching subglottal models given in Figure 8 in [28] have the 
second subglottal resonance between 1.3 kHz and 1.5 kHz. It was observed in [65] 
that the soft tissues introduce an additional resonance to the subglottal system 
that is lower than the first subglottal formant F[ due to air column dynamics. 
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The is no obvious way how a horn model could be used to accommodate such a 
resonance at « 350 Hz due to the yielding wall dynamics. 

Based on the observations on the simulated vowel glides, it seems convincing 
that the overall subglottal effect on the fundamental frequency fo is insignificant 
for vowel glides within [150Hz, 320Hz] that is over 100Hz away from How¬ 
ever, the subglottal effect is certainly discernible in waveforms as in Figure 8, 
but the effect of higher subglottal resonances Irs, ■ ■ ■ cannot be seen even 
there. In current simulations of female phonation, the vocal fold mass-spring 
system has its mechanical resonances at approximately 150 Hz, which acts as 
a low-pass filter for subglottal excitation in higher frequencies. The same con¬ 
clusions are likely to hold when using a more complicated subglottal resonator 
geometry with one caveat: a graph-like subglottal geometry has lots of cross¬ 
mode resonances that affect the subglottal acoustic impedance in other ways 
than just moving the pole positions. 

Also the DC-component of the glottal flow loads the acoustic resonators 
in equations (11) and (12). If we use Vac{t) = vo{t) — ^ Jf_xVo{T) dr with 
T = 2/fo instead of vq as input to the resonator equations, only negligible 
effects are observed in simulated stable waveforms. There are more pronounced 
effects in the beginnings of simulated phonation when a stable waveform has 
not yet developed. 

9.2 Vocal fold geometry and glottal flow 

The idealised vocal folds geometry shown in Figure 1 (right panel) leads to a 
particularly simple expression for the aerodynamic force in equation (8). Re¬ 
placing the sharp peaks by flat tops in Figure 1 (but keeping the same glottal 
gap g at rest) results in phonation that has typically lower open quotient (OQ) 
whereas the original wedge-like geometry produces more often phonation where 
the glottis does not close. This change makes it easier to adjust the parametri- 
sation of the model to obtain some phonation targets. In particular, the glottal 
loss parameter kg can be set closer to more commonly used, somewhat larger 
values since the model geometry becomes more similar to the geometries used 
in related literature. 

Another aspect involving the aerodynamic force on vocal fold structures 
is associated with the hydrostatic pressure reference level in vibrating tissues. 
This level is denoted by Pref, and it is expected to satisfy Pref < Psub- If the air 
pressure between the vocal folds were equal to Pref, then the vocal fold wedges 
would not be accelerated by the pressure difference. In equation (7), we use 
Pref = Psub, and we always have p{x, t) — Psub "£ 0. For this reason, the effect 
of the aerodynamic force is always trying to close the glottis in this case. For 
small flow velocities V{x, t), using p{x, t) —pref with p^ef < Psub in equation (7) 
would give the following outcome: the driving pressure Psub would push the 
vocal folds open more strongly than the aerodynamic force would pull them 
close. Unfortunately, there is no obvious way to determine the true magnitude 
of Pref as it is an outcome of dynamic pressure equalisation processes related to 
Psub and the additional partial pressure due to haemodynamics in tissues. Using 
a tuned value of pref instead of Psub in equation (7) would be desirable, e.g., 
in phonation onset simulations; in particular, if g = 0 where using pref = Psub 
would not start a phonation cycle at all. 

The glottal flow has been studied extensively since 1950’s. Compared to 
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the flow model given above, physiologically more faithful glottal flow solvers 
have been proposed in, e.g., [37], [73] and [74[; see also [51], [10], [75], and 
[44] . As pointed out in [75], more sophisticated flow models are challenging to 
couple to acoustic resonators since the interface between the flow-mechanical 
(in particular, the turbulent) and the acoustic components is no longer clearly 
defined. 

Flow separation and Coanda effect during the diverging phase of the phona- 
tory cycle (which obviously cannot occur in wedge-like geometry of Figure 1) 
have been studied in [74], [76] and [37] using boundary layer theory and phys¬ 
ical model experiments. The boundary layer leaves the vocal fold surfaces at 
the time-dependent flow separation point, say Xs, forming a jet which extends 
downstream into supraglottal space. Thus, the vocal folds “stall” at Xs, and 
the aerodynamic force on them is greatly diminished; see Section IV in [74] 
where the vocal fold model is from [77]. Similarly, the viscous pressure loss 
equation (A7) in [37] depends only on the upstream part of glottis that ends 
at Xs- Simplifying assumptions on the vocal fold geometry [37] are required 
for computing Xs, and the result is sensitive to the geometry which makes it 
challenging to model. 

Turbulence in supraglottal space is a spatially distributed acoustic source, 
and it does not provide a scalar flow velocity signal for boundary control as uq in 
equation (11). The supraglottal jet may even exert an additional aerodynamic 
force to vocal folds that would not be part of the acoustic counter pressure Pc 
from the acoustic resonators. Turbulence in VT constrictions is the primary 
acoustic source for unvoiced fricatives, and many such sources have been mod¬ 
elled separately in, e.g., [51] . Much of the turbulence noise energy lies above 
4 kHz but Webster’s model equation (9) is an accurate description of VT acous¬ 
tics only below 4 kHz due to the lack of cross-modes [78, 79] . This fact speaks 
against the wisdom of including turbulence noise in the proposed model. 

The proposed phonation model treats flow-mechanical and acoustic com¬ 
ponents using separate equations, and we conclude that this paradigm is not 
conducive for including the advanced flow-mechanical features discussed above. 
Instead, phonation models based on Navier-Stokes equations would be a more 
appropriate framework. 


10 Conclusions 

We have presented a model for vowel production, based on (partial) differential 
equations, that consists of submodels for glottal flow, vocal folds oscillations, 
and acoustic responses of the VT and subglottal cavities. The model has been 
originally designed as a tunable glottal pulse source for a high-resolution VT 
acoustics simulator that is based on the 3D wave equation and VT geometries 
obtained by MRI as explained in [5, 55] . The model has found applications as a 
controlled source of synthetic vowels that are needed in, e.g., developing speech 
processing algorithms such as the inverse Altering [6, 7[. 

In this article, the model was used for simulations of rising and falling vowel 
glides of [a, i[ in frequencies that span one octave [150 Hz, 320 Hz]. This interval 
contains the lowest VT resonance fm of [i[ but not that of [a[. Perturbation 
events in simulated vowel glides were observed at VT acoustic resonances, or 
at some of their fractions but nowhere else. The fundamental frequency fo of 
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the simulated vowel was observed to lock to fm [i] but similar locking was not 
seen at any of the resonance fractions. The locking events were accompanied 
by changes in the phonation: increased breathiness below and partially at the 
locking frequency and steady change in breathiness during most of the lock. 
Such modal locking event takes place only when the acoustic feedback from 
VT to vocal folds is present, and then it has a characteristic time-dependent 
behaviour. A large number of simulation experiments were carried out with dif¬ 
ferent parameter settings of the model to verify the robustness and consistency 
of all observations. 

To what extent do the simulation results validate the proposed model? The 
model produces perturbations of the glottal pulse both at VT resonances and 
at some of the VT resonance fractions. Of the former, a wide existing literature 
was reviewed in Introduction. Observations on the subformant perturbations 
in speech have not been reported, to our knowledge, in experimental literature. 
There is a particular temporal pattern of locking in simulated perturbations at 
fmli] as explained in Figure 6 (left panel). Although pulse based fo trajectories 
are rarely shown in literature, a similar pattern can be seen in the speech spec¬ 
trograms given in Figure 5 in [19], and Figure 4 in [18], as well as in vowel glide 
samples in the data set of the companion article [3] . A similar locking behaviour 
can also be seen in simulated spectrograms in Figure 6 in [80] and Figures 13 
and 14 in [16], and it can also be interpreted to lie behind the experimental 
results shown in Figures 10b and 13b of [14]. The glottal flow and area simula¬ 
tions in Figure 8 are remarkably similar with the experimental data presented 
in Figures 4-7 in [57], the signals produced by different numerical models (see 
Figures 14a-14c in [10], Figures 8 and 10 in [43], Figures 10-11 in [28], Figure 6 
in [73], Figure 5 in [81]), and the glottal pulse waveforms obtained by inverse 
Altering in, e.g.. Figures 10-13 in [6], Figures 5.3, 5.4, and 5.17 in [82], [83], and 
Figures 3 and 6 in [7[. 

The simulation model does not include the neural control actions on the vocal 
fold structures or dynamic modifications of the VT geometry. There is also a 
significant control action affecting the subglottal pressure and it has been used 
as a control variable in equations (17)-(18) for glide productions. In humans, 
neural control actions are part of feedback loops, of which some are auditive, and 
some others operate directly through tissue innervation and the central nervous 
system. So little is known about these feedback mechanisms that their explicit 
mathematical modelling seems infeasible. Instead, the model parameters for 
simulations are tuned so that the simulated glottal pulse waveform corresponds 
to experimental speech data. Despite these simplifications the model appears 
to be sufficiently detailed to replicate the observations found in literature. 
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